electronic health record
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Middle East > Israel (0.04)
- North America > United States (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Asia > South Korea (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Health & Medicine > Therapeutic Area (0.72)
- Health & Medicine > Health Care Technology > Medical Record (0.48)
Efficient Variance-reduced Estimation from Generative EHR Models: The SCOPE and REACH Estimators
Solo, Luke, McDermott, Matthew B. A., Parker, William F., Ramadan, Bashar, Burkhart, Michael C., Beaulieu-Jones, Brett K.
Generative models trained using self-supervision of tokenized electronic health record (EHR) timelines show promise for clinical outcome prediction, which is typically performed via Monte Carlo simulation of future patient trajectories. However, existing approaches suffer from three key limitations: sparse estimate distributions that poorly differentiate patient risk levels, extreme computational costs, and high sampling variance. We propose two new estimators, the Sum of Conditional Outcome Probability Estimator (SCOPE) and Risk Estimation from Anticipated Conditional Hazards (REACH), that leverage the next-token probability distributions discarded by standard Monte Carlo sampling. We prove both estimators are unbiased and that REACH guarantees variance reduction over Monte Carlo sampling for any model and outcome. Empirically, on hospital mortality prediction in MIMIC-IV using the ETHOS-ARES framework, SCOPE and REACH match 100-sample Monte Carlo performance using only 10-11 samples (95% CI: [9, 11]), a ~10x reduction in inference cost without degrading calibration. For ICU admission prediction, the efficiency gains are more modest (~1.2x), which we attribute to the outcome's lower "spontaneity," a property we characterize theoretically and empirically. These methods substantially improve the feasibility of deploying generative EHR models in resource-constrained clinical settings.
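The gap between discarding and reusing next-token probabilities can be illustrated with a toy hazard sequence. This is a minimal sketch, not the paper's SCOPE or REACH implementation: `mc_estimate` and `hazard_estimate` are hypothetical stand-ins, and the per-step outcome hazards are assumed fixed rather than trajectory-dependent as they would be under a real generative model.

```python
import random

def mc_estimate(step_hazards, n_samples, rng):
    # Standard Monte Carlo: sample whether the outcome token fires at each
    # step, keep only the binary trajectory-level indicator, and average.
    hits = 0
    for _ in range(n_samples):
        if any(rng.random() < h for h in step_hazards):
            hits += 1
    return hits / n_samples

def hazard_estimate(step_hazards):
    # Hazard-based estimator: reuse the per-step conditional probabilities
    # directly; P(outcome) = 1 - prod_t (1 - h_t). For a fixed hazard
    # sequence this has zero sampling noise.
    survive = 1.0
    for h in step_hazards:
        survive *= 1.0 - h
    return 1.0 - survive

hazards = [0.1, 0.2, 0.05]
exact = hazard_estimate(hazards)                  # 1 - 0.9*0.8*0.95 = 0.316
noisy = mc_estimate(hazards, 2000, random.Random(0))
```

Even in this toy setting, the Monte Carlo estimate fluctuates around the value the hazards already determine, which is the variance the paper's estimators aim to remove.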
- North America > United States > New York > New York County > New York City (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Health & Medicine > Health Care Technology > Medical Record (0.87)
Communication-Efficient Federated Risk Difference Estimation for Time-to-Event Clinical Outcomes
Wang, Ziwen, Li, Siqi, Ong, Marcus Eng Hock, Liu, Nan
Privacy-preserving model co-training in medical research is often hindered by server-dependent architectures incompatible with protected hospital data systems and by the predominant focus on relative effect measures (hazard ratios), which lack clinical interpretability for absolute survival risk assessment. We propose FedRD, a communication-efficient framework for federated risk difference estimation in distributed survival data. Unlike typical federated learning frameworks (e.g., FedAvg) that require persistent server connections and extensive iterative communication, FedRD is server-independent with minimal communication: one round of summary statistics exchange for the stratified model and three rounds for the unstratified model. Crucially, FedRD provides valid confidence intervals and hypothesis testing, capabilities absent from FedAvg-based frameworks. We provide theoretical guarantees by establishing the asymptotic properties of FedRD and prove that FedRD (unstratified) is asymptotically equivalent to pooled individual-level analysis. Simulation studies and real-world clinical applications across different countries demonstrate that FedRD outperforms local and federated baselines in both estimation accuracy and prediction performance, providing an architecturally feasible solution for absolute risk assessment in privacy-restricted, multi-site clinical studies.
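The one-round summary-statistics idea can be sketched for the simplest binary-outcome case. This is a hypothetical illustration only: FedRD's actual estimators handle censored time-to-event data and stratification, which this toy omits, and the function names and counts are invented.

```python
import math

def site_summary(events_trt, n_trt, events_ctl, n_ctl):
    # Each site ships only aggregate counts -- no individual-level records
    # leave the hospital, and no server round-trips are required.
    return {"e1": events_trt, "n1": n_trt, "e0": events_ctl, "n0": n_ctl}

def pooled_risk_difference(summaries, z=1.96):
    # Combine the shared summaries into an absolute risk difference
    # with a Wald confidence interval.
    e1 = sum(s["e1"] for s in summaries)
    n1 = sum(s["n1"] for s in summaries)
    e0 = sum(s["e0"] for s in summaries)
    n0 = sum(s["n0"] for s in summaries)
    p1, p0 = e1 / n1, e0 / n0
    rd = p1 - p0
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return rd, (rd - z * se, rd + z * se)

sites = [site_summary(10, 100, 20, 100), site_summary(5, 50, 10, 50)]
rd, ci = pooled_risk_difference(sites)   # 0.10 - 0.20 = -0.10
```

The point of the sketch is the communication pattern: a single exchange of small summaries yields an absolute risk measure with an interval estimate, rather than many gradient-sync rounds through a central server.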
- Asia > Singapore (0.04)
- North America > United States (0.04)
- Europe > Norway > Norwegian Sea (0.04)
- Asia > China (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.88)
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images
Electronic Health Records (EHRs) contain patients' medical histories in various multi-modal formats, yet current EHR Question Answering (QA) systems leave joint reasoning across the imaging and table modalities underexplored. In this paper, we introduce EHRXQA, a novel multi-modal question answering dataset combining structured EHRs and chest X-ray images. To develop our dataset, we first construct two uni-modal resources: 1) the MIMIC-CXR-VQA dataset, our newly created medical visual question answering (VQA) benchmark, specifically designed to augment the imaging modality in EHR QA, and 2) EHRSQL (MIMIC-IV), a refashioned version of a previously established table-based EHR QA dataset. By integrating these two uni-modal resources, we construct a multi-modal EHR QA dataset that necessitates both uni-modal and cross-modal reasoning. To address the unique challenges of multi-modal questions within EHRs, we propose a NeuralSQL-based strategy equipped with an external VQA API. This pioneering endeavor enhances engagement with multi-modal EHR sources, and we believe our dataset can catalyze advances in real-world medical scenarios such as clinical decision-making and research.
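The NeuralSQL-with-VQA-API strategy can be pictured as a SQL engine that exposes the image model as a callable function inside the query. This is a toy sketch, not the paper's system: `VQA` here is a stubbed lookup, and the table, image paths, and question are invented for illustration.

```python
import sqlite3

def vqa_stub(image_path, question):
    # Hypothetical stand-in for a call out to the external VQA model.
    canned = {("cxr_001.png", "is there cardiomegaly?"): "yes"}
    return canned.get((image_path, question.lower()), "unknown")

conn = sqlite3.connect(":memory:")
conn.create_function("VQA", 2, vqa_stub)  # expose VQA as a SQL function
conn.execute("CREATE TABLE study (patient_id TEXT, image_path TEXT)")
conn.executemany("INSERT INTO study VALUES (?, ?)",
                 [("p1", "cxr_001.png"), ("p2", "cxr_002.png")])

# A cross-modal question: filter table rows by an image-level predicate.
rows = conn.execute(
    "SELECT patient_id FROM study "
    "WHERE VQA(image_path, 'Is there cardiomegaly?') = 'yes'"
).fetchall()
```

The design point is that the structured (table) and imaging modalities meet inside one executable query, so a single question can require both kinds of reasoning.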
Toward an AI Reasoning-Enabled System for Patient-Clinical Trial Matching
Leach, Caroline N., Klusty, Mitchell A., Armstrong, Samuel E., Pickarski, Justine C., Hankins, Kristen L., Collier, Emily B., Shah, Maya, Mullen, Aaron D., Bumgardner, V. K. Cody
Screening patients for clinical trial eligibility remains a manual, time-consuming, and resource-intensive process. We present a secure, scalable proof-of-concept system for Artificial Intelligence (AI)-augmented patient-trial matching that addresses key implementation challenges: integrating heterogeneous electronic health record (EHR) data, facilitating expert review, and maintaining rigorous security standards. Leveraging open-source, reasoning-enabled large language models (LLMs), the system moves beyond binary classification to generate structured eligibility assessments with interpretable reasoning chains that support human-in-the-loop review. This decision support tool represents eligibility as a dynamic state rather than a fixed determination, identifying matches when available and offering actionable recommendations that could render a patient eligible in the future. The system aims to reduce coordinator burden, intelligently broaden the set of trials considered for each patient, and guarantee comprehensive auditability of all AI-generated outputs.
Introduction
Applications of artificial intelligence (AI) in healthcare are increasingly focused on improving administrative efficiency and optimizing clinical workflows. Identifying relevant trials and screening them for a particular patient is traditionally manual, time-consuming, and heavily reliant on clinical expertise.
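A structured eligibility assessment with eligibility as a dynamic state might be represented as below. This is a hedged sketch of the data shape only: the class names, the trial identifier, and the criterion text are all hypothetical, not the authors' schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class EligibilityState(Enum):
    ELIGIBLE = "eligible"
    INELIGIBLE = "ineligible"
    POTENTIALLY_ELIGIBLE = "potentially_eligible"  # actionable gap remains

@dataclass
class CriterionAssessment:
    criterion: str
    met: Optional[bool]       # None = insufficient evidence in the EHR
    reasoning: str            # interpretable reasoning chain for review
    recommendation: str = ""  # action that could change the state later

@dataclass
class TrialMatch:
    trial_id: str
    state: EligibilityState
    criteria: list = field(default_factory=list)

# Hypothetical example of a match that is not yet decidable.
match = TrialMatch(
    trial_id="NCT-EXAMPLE",
    state=EligibilityState.POTENTIALLY_ELIGIBLE,
    criteria=[CriterionAssessment(
        criterion="eGFR >= 60 mL/min",
        met=None,
        reasoning="No creatinine result in the last 90 days.",
        recommendation="Order a basic metabolic panel.",
    )],
)
```

Modeling `met` as a tri-state value is what lets the system surface "could become eligible" cases with a concrete next action, rather than forcing a binary yes/no.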
- North America > United States > Kentucky > Fayette County > Lexington (0.14)
- Europe > Italy > Abruzzo (0.04)
- Asia > Middle East > Jordan (0.04)
Rethinking Tokenization for Clinical Time Series: When Less is More
Attrach, Rafi Al, Fani, Rajna, Restrepo, David, Jia, Yugang, Schüffler, Peter
Tokenization strategies shape how models process electronic health records, yet fair comparisons of their effectiveness remain limited. We present a systematic evaluation of tokenization approaches for clinical time series modeling using transformer-based architectures, revealing task-dependent and sometimes counterintuitive findings about temporal and value feature importance. Through controlled ablations across four clinical prediction tasks on MIMIC-IV, we demonstrate that explicit time encodings provide no consistent statistically significant benefit for the evaluated downstream tasks. Value features show task-dependent importance, affecting mortality prediction but not readmission, suggesting code sequences alone can carry sufficient predictive signal. We further show that frozen pretrained code encoders substantially outperform their trainable counterparts while requiring far fewer trainable parameters. Larger clinical encoders provide consistent improvements across tasks, and freezing their embeddings eliminates the associated training overhead. Our controlled evaluation enables fairer tokenization comparisons and demonstrates that simpler, parameter-efficient approaches can, in many cases, achieve strong performance, though the optimal tokenization strategy remains task-dependent.
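The ablation axes (time channel on/off, value channel on/off) can be made concrete with a toy tokenizer. This is a hypothetical sketch, not the paper's tokenization scheme: the bucket widths, bin edges, and token formats are invented for illustration.

```python
def value_bin(value, edges=(0.5, 1.5)):
    # Crude value discretization; the edges are invented for illustration.
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)

def tokenize_events(events, include_time=False, include_value=False):
    # Turn (hours_since_admission, code, value) triples into a token list.
    # The time and value channels are optional, mirroring the ablation axes.
    tokens = []
    for hours, code, value in events:
        if include_time:
            tokens.append(f"T{int(hours // 6)}")  # coarse 6-hour bucket
        tokens.append(code)
        if include_value:
            tokens.append(f"{code}:Q{value_bin(value)}")
    return tokens

events = [(3.0, "LAB_CREATININE", 1.2), (14.0, "MED_FUROSEMIDE", 40.0)]
codes_only = tokenize_events(events)  # the code-sequence-alone condition
full = tokenize_events(events, include_time=True, include_value=True)
```

Comparing models trained on `codes_only` versus `full` under otherwise identical settings is the kind of controlled contrast the paper's ablations perform.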
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.68)